Spatial Modelling with inlabru
School of Mathematics and Statistics, University of Glasgow
December 15, 2025
Space is inherent to all ecological processes, influencing dynamics such as migration, dispersal, and species interactions. A primary goal in ecology is to understand how these processes shape species distributions and dynamics across space.
Many natural processes take place in space
Large amounts of data collected in space;
Recent technologies facilitate access to complex spatial data sets.
Spatial statistical analysis is often not complex enough
Inaccessible to practitioners
Literature written for statisticians development of methodology often not linked to applications (unrealistic assumptions)
Difficult to apply (unless you are an expert statistician and programmer)
we shall see that inlabru can help with this…
We can distinguish three types of spatial data structures
Areal data
Geostatistical data
Point-referenced data
We can distinguish three types of spatial data structures
Areal data
In areal data our measurements are summarised across a set of discrete, non-overlapping spatial units.
Geostatistical data
Point-referenced data
We can distinguish three types of spatial data structures
Areal data
Geostatistical data
In geostatistical data, measurements a continuous process are taken at a set of fixed locations.
Point-referenced data
We can distinguish three types of spatial data structures
Areal data
Geostatistical data
Point-referenced data
In point-referenced data we measure the locations where events occur (e.g. trees in a forest, earthquakes) and the coordinates of such occurrences are our data.
Imagine we have animal counts in each region. We can model them as Poisson
\[ y_i \sim \mathrm{Poisson}(e^{\eta_i}) \] How do we model the linear predictor \(\eta_i\)?
\[ \eta_i = \beta_0 + \mathbf{x}'\beta + u_i ~~~ u_i \overset{iid}{\sim} N(0,\sigma^2_i) \]
Imagine we have animal counts in each region. We can model them as Poisson
\[ y_i \sim \mathrm{Poisson}(e^{\eta_i}) \] How do we model the linear predictor \(\eta_i\)?
Imagine we have animal counts in each region. We can model them as Poisson
\[ y_i \sim \mathrm{Poisspn}(e^{\eta_i}) \] How do we model the linear predictor \(\eta_i\)?
\[ \eta_i = \beta_0 + \mathbf{x}'\beta + u_i ~~~ u_i \overset{iid}{\sim} N(0,Q^{-1}) \]
An areal process (or lattice process) is a stochastic process defined on a set of regions that form a partition of our region of interest \(D\).
Let \(B_1, \ldots B_m\) be our set of \(m\) distinct regions such that: \[\bigcup\limits_{i=1}^m \hspace{1mm}B_i = D.\]
Here we require that our regions are non-overlapping, with \[B_i \cap B_j = \emptyset.\]
Then our areal process is simply the stochastic process \[\{Z(B_i); i=1,\ldots,m\}.\]
Each of our regions \(B_i\) has a set of other nearby which can be considered neighbours
We might expect that areas have more in common with their neighbours.
Therefore, we can construct dependence structures based on the principle that neighbours are correlated and non-neighbours are uncorrelated.
However, we need to come up with a sensible way of defining what a neighbour is in this context.
There are many different ways to define a region’s neighbours.
The most common ones fall into two main categories - those based on borders, and those based on distance.
Common borders
Assume that regions which share a border on a map are neighbours.
Simple and easy to implement
Treats all borders the same, regardless of length, which can be unrealistic.
Areas very close together are not neighbours if there is even a small gap between them.
Distance-based
Assume that regions which are a within a certain distance of each other area neighbours.
What distance do you choose? How do you decide that?
Where do you measure from? (e.g., nearest border or a central point).
Once we have identified a set of neighbours using our chosen method, we can use this to account for spatial correlation.
We construct a neighbourhood matrix (or proximity matrix), which defines how each of our \(m\) regions relate to each other.
Let \(W\) denote an \(m \times m\) matrix where the \((i,j)\)th entry, \(w_{ij}\) denotes the proximity between regions \(B_i\) and \(B_j\).
Note
The values of this matrix can be discrete (which regions are neighbours) or continuous (how far apart are the regions).
By far the most common approach is to use a binary neighbourhood matrix, \(W\), denoted by
\[ \begin{aligned} w_{ij} &= 1 \hspace{2mm} \mbox{ if areas} (B_i, B_j) \mbox{ are neighbours.}\\ w_{ij} &= 0 \hspace{2mm} \mbox{ otherwise.} \end{aligned} \]
Dependencies structures are described through this spatial weights matrix
Binary matrices are used for their simplicity
One of the most popular CAR approaches to model spatial correlation is the Besag model a.k.a. Intrinsic Conditional Autoregressive (ICAR) model. The conditional distribution for \(u_i\) is
\[ u_i|\mathbf{u}_{-i} \sim N\left(\frac{1}{d_i}\sum_{j\sim i}u_j,\frac{1}{d_i\tau_u}\right) \]
\(\mathbf{u}_{-i} = (u_i,\ldots,u_{i-1},u_{i+1},\ldots,u_n)^T\)
\(\tau_u\) is the precision parameter (inverse variance).
\(d_i\) is the number of neighbours
The mean of \(u_i\) is equivalent to the the mean of the effects over all neighbours, and the precision is proportional to the number of neighbours.
The joint distribution is given by:
\[ \mathbf{u}|\tau_u \sim N\left(0,\frac{1}{\tau_u}Q^{-1}\right), \]
Where \(Q\) denotes the precision matrix defined as
\[ Q_{i,j} = \begin{cases} d_i, & i = j \\ -1, & i \sim j \\ 0, &\text{otherwise} \end{cases} \]
This structure matrix directly defines the neighbourhood structure and is sparse.
The ICAR model accounts only for spatially structured variability and does not include a limiting case where no spatial structure is present.
We typically add an unstructured random effect \(z_i|\tau_v \sim N(0,\tau_{z}^{-1})\)
The resulting model \(v_i = u_i + z_i\) is known as the Besag-York-Mollié model (BYM)
The structured spatial effect is controlled by \(\tau_u\) which control the degree of smoothing:
Higher \(\tau_u\) values lead to stronger smoothing (less spatial variability).
Lower \(\tau_u\) values allow for greater local variation.
In the next example we will illustrate how to fit this model using inlabru.
In This example we model the number of respiratory hospitalisations across Intermediate Zones (IZ) that make up the Greater Glasgow and Clyde health board in Scotland.
In epidemiology, disease risk is usually estimated using Standardized Mortality Ratios (SMR) computed as:
\[ SMR_i = \dfrac{Y_i}{E_i} \]
- Stage 1: We assume the responses are Poisson distributed: \[ \begin{aligned}y_i|\eta_i & \sim \text{Poisson}(E_i\lambda_i)\\\text{log}(\lambda_i) = \color{#FF6B6B}{\boxed{\eta_i}} & = \color{#FF6B6B}{\boxed{\beta_0 + u_i + z_i} }\end{aligned} \]
- Stage 2: \(\eta_i\) is a linear function of three components: an intercept, a spatially structured effect \(u\) and an unstructured iid random effect \(z\):
\[ \eta_i = \beta_0 + \overbrace{u_i + z_i}^{v_i} \]
- Stage 3:
- \(\{\tau_{z},\tau_u\}\): Precision parameters for the random effects
The latent field is \(\mathbf{x}= (\beta_0, \beta_1, u_1, u_2,\ldots, u_n,z_1,...)\), the hyperparameters are \(\boldsymbol{\theta} = (\tau_u,\tau_z)\), and must be given a prior.
inlabru for disease mappingThe Model
\[ \begin{aligned} y_i|\eta_t & \sim \text{Poisson}(E_i\lambda_i)\\ \text{log}(\lambda_i) = \eta_i & = \color{#FF6B6B}{\boxed{\beta_0}} + \color{#FF6B6B}{\boxed{\beta_1\ c_i}} + \color{#FF6B6B}{\boxed{\omega_i}} \end{aligned} \]
The code
# define model component
cmp = ~ Intercept(1) +
space(space, model = "besag", graph = Q) +
iid(space, model = "iid")
# define model predictor
eta = observed ~ Intercept + space + iid
# build the observation model
lik = bru_obs(formula = formula,
family = "poisson",
E = expected,
data = resp_cases)
# fit the model
fit = bru(cmp, lik)